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Abstract. In this paper we pose the question: After gathering N data points, 
at what value of the control parameter should the next measurement be done? 
We propose an on-line algorithm which samples optimally by maximizing the gain 
in information on the parameters to be measured. We show analytically that the 
information gain is maximum for those potential measurements whose outcome 
is most unpredictable, i.e. for which the predictive distribution has maximum 
entropy. The resulting algorithm is applied to exponential analysis. 
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1. Introduction 

Experiments usually measure the response of a physical system to an input. The 
measured set of outputs together with the corresponding inputs is then used to 
determine the parameters of some assumed model. For instance, for determining 
the electric resistance one would measure the resulting current for several values 
of the applied voltage. The voltage acts as a sort of control parameter. For time 
series the corresponding control parameter would be the time instances at which 
the signal is sampled. 

The most common procedure for measuring physical parameters is to sample 
the underlying physical signal equidistantly with respect to the control parameter. 
However, it is often not clear how the accuracy of the parameters to be measured 
depends on the range over which the control parameter is varied. This is typical 
for signals of the form exp(— t/r), which decay exponentially in the control param- 
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eter t. For instance, if one happens to choose the maximum value of the control 
parameter to be much larger than the unknown decay constant t many samples 
would obviously be "pure noise" , thus reducing the accuracy of the estimate of r. 

The situation is particularly severe for sums of exponentials. There, the sam- 
pling rate clearly determines the smallest resolvable decay constant. Since for 
equidistant sampling those parts of the signal with larger decay constants are 
sampled more often, the estimates become the more accurate the larger the decay 
constant. Hence, a lot of sampling is wasted in order to estimate the easily to 
determine larger decay constants. 

An experimentalist measuring a spectrum of decay constants consequently faces 
the question how to choose the values of the control parameters in order to obtain 
accurate measurements while restricting the maximum number of experiments to 
N . This task is basically one of design of experiment, a discipline well-known in 
statistics jp. Due to the nonlinear dependence on the decay constants this is a 
nontrivial task, however. It would involve an optimization with respect to the 
control parameters of the N measurements to be done. 

In this paper, we take a simplified approach by posing the question: After 
gathering N data points, at what value of the control parameter should the next 
measurement be done? We propose an on-line algorithm which samples optimally 
by maximizing the gain in information on the parameters to be measured. By this, 
the experiment is designed "sequentially" or "on-line" rather than completely from 
the onset. 

Due to its design, the algorithm will choose at each step of the experiment the 
value of the control parameter at which the next experiment should be performed. 
In Section || we choose the entropy of the posterior as an information measure and 
introduce the algorithm for sequential MaxEnt sampling. We show analytically 
that the algorithm reduces to finding that value of the control parameter for which 
the width of the predictive distribution is maximum. 

Section |^ gives an application of the algorithm to exponentially decaying sig- 
nals and Section ^ compares the performance of the algorithm with equidistant 
sampling. In Section || we apply the algorithm to exponentially decaying signals 
having more than one decay constant. Finally, Section |^ summarizes the results 
and provides a conclusion. 

2. MaxEnt formulation of sequential sampling 

Assume we have determined a discrete data set Djy = {(ii, s±), (tjvj sat)}, sam- 
pled from s(t) at discrete instances of the control parameter t±, tjv, with a model 
equation 

Si = s(U) = + (1) 

where f(t) is the signal and £ represents noise in the problem. Usually one has 
some parameterized form of f(t) and the aim of the experiment is to determine 
(at least some of) these parameters. Having obtained Djy, the knowledge of these 
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parameters is contained in the posterior distribution 

m ° N) ~ P(D N ) ' (2) 

where (3 denotes the whole set of parameters to be determined. For the prior 
probability of the noise P(C) = P(Dn\(3) we take the least informative prior B, 
i.e. a Gaussian of zero mean and variance a 2 . 

The aim of the experiment is to maximize the information about (3 during the 
course of the experiment. Using the negative entropy of the posterior as a measure 
of information this amounts to minimizing the entropy 

S(D N ) = - J df3P((3\D N ) log P{(3\D N ). (3) 

The most informative experiment would then be defined by those values of the 
control parameter t which satisfy S(D n ) = min, implying 

dS ^ N K Q for all k=l,...,N. (4) 



dt k 



In general this equation will be too difficult to solve. Moreover, the optimization 
(Q) is not advisable, since it relies on a fixed model for f(t). In general, there might 
be different optimal experiments for different models. 

These difficulties can be avoided if one chooses to design the experiment in 
a sequential way. Instead of determining the values of all U one could ask the 
question: After gathering N data points, at what value of the control parameter 
ijv+i should the next measurement be done in order to gain as much information 
as possible about the parameters (3 to be estimated? 

The answer to this is obtained by varying S{Dn+i) = S (Dn U (ijv+i, sjv+i)) 
with respect to the value of the control parameter tjv+i at the next experiment. 
That is, one has to determine the minimum of 

S(t N+1 ) = J ds N+1 S(D N +i)P{sN+i\D N ,tN+i) (5) 

with respect to i/v+i- Note that one has to average out the unknown outcome 
sjv+i of the next experiment using the predictive distribution P(sn+i\Dn, tjv+i). 
The predictive reflects the knowledge about sn+i given the previously obtained 
data Dm and the control parameter of the next experiment. 

Our sequential approach to design optimal experiments is somewhat similar to 
so-called query learning in neural networks. There one searches for such training 
examples that speed up the learning process considerably |||4||5|]. Based on this 
similarity we name the value of the control parameter t^v+i which minimizes ([&]) 
the query t q . 

Applying Bayes rule one can rewrite the r.h.s. of (^|) in the form 



S(t N+ i) = I ds N+ iP(s N+ i\D N ,t N+ i)logP(s N+ i\D N ,t N+ i) 

J df3P{f3\D N ) log P(J3\D N ) + i (1 + log 27T0 3 ) . (6) 
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Note that the only dependence on ijv+i is via the entropy of the predictive distri- 
bution 

(7) 



Spredir) = - J ds N+1 P(s N+1 \D N ,T)\ogP(s N+1 \D N ,T). 



Hence, in order to minimize the entropy of the posterior with a new measurement 
(tq,s(t q )) and N old measurements Dn, one has to maximize the entropy of the 
predictive distribution. The reason for this is that t q is that value of the control 
parameter for which the prediction is least secure. Since the derived sampling 
procedure is based on a maximum entropy criterion, we call it sequential MaxEnt 
sampling. 

Two notes are in place here, concerning the implementation of the above algo- 
rithm: According to (^J), only P(sn + i\Dn, tpj + i) is needed in order to determine 



t q . By definition 



P{sn+i\Dn, tN+l) 
For model functions of the form 



P{D 



N+l) 



P{D N ) 



J d(3P{f3\D N+1 ) 
Jd(3P(f3\D N ) ■ 



(8) 



/(*) = £V? 3 -(*,{A}) 



(9) 



the integration with respect to the model parameters f3 = Aj, A, a can be simplified 
if one treats the parameters Aj as nuisance parameters, see Q for details. In the 
same manner, the variance of the noise in ([!]) can be integrated out analytically 
using Jeffreys prior. Hence, the only parameters that actually need to be integrated 
over in (||) are the A, which enter f(t) in a nonlinear manner, in general. 

For sufficiently large values of N the predictive distribution can be well ap- 
proximated by a Gaussian of width <7 pre d- The most probable value of s(ijv+i) will 
then be (27rcTp red ) -1 / 2 , which depends on tjv+i via a pre d- Maximization of S pre d 
then amounts to searching for that value of t^+i for which the most probable 
value of s(iAr-i-i) is minimum, see Figure^. 



3. Application to a monoexponentially decaying signal 

As a first example we apply the proposed sequential MaxEnt sampling procedure 
to an exponentially decaying signal with only one decay constant: 

s(t) = Aexp(-At) + £- (10) 

As before, £ denotes additive noise of variance a. 

It is obvious that the sampling technique proposed in Section ^| cannot be 
applied from scratch, i.e. without any experimental data at hand. For estimating 
A, for instance, one needs to have at least two measurements at different time 
instances since one needs at least two points to fit a straight line to logs. Here we 
use three initial measurements, because two measurements lead to a posterior of 
width which is difficult to handle numerically. 
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Figure 1. Left: Predictive distribution for various values of tjv+i (left to right: tjv+i = 1-5, 
1.0,0.5,0). In the case shown the next measurement would be performed at t q = 1.1. For this 
value the predictive distribution has the largest width and the corresponding entropy is maximum 
(right). The results have been obtained for a monoexponcntially decaying signal as discussed in 
Section N. 
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Figure 2. Simulation results for the sampling times t q for A = 1, <r = 0.1. The decay constant 
has been chosen to be A = 1 (left) and A = 0.01 (right) 



For the simulation results shown in the following, the required three initial mea- 
surements have been generated by drawing ti, i = 1..3 from a uniform distribution 
in [0, 2], In general, one would perform the initial measurements within an interval 
which is of the order of the decay time r = 1/A. Note that this choice should 
be possible in principle, since an experimentalist will have some prior knowledge 
about the decay constant. However, such a choice is not mandatory, as is exem- 
plified in Figure | There we show simulation results for A = 1 and A = 0.01. In 
the latter case the initial measurements are at times much smaller than the decay 
constant t = 100, yet the sequential MaxEnt sampling algorithm is capable of 
tracking the signal. 

Figure || shows the evolution of sample times t q for increasing length TV of the 
experiment. We find the remarking behavior that the sampling times "oscillate" 
between t q = and t q ~ 1/A. Note that the period is not strictly equal to one, 
since sometimes two or more consecutive measurements are at either t q — or 
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Figure 3. Distribution of sampling times t q ^ for an exponentially decaying signal of form 
(td) with A = 1 and A = 1. The histograms have been obtained for 10 experiments of length 
N = 100. Left: a = 0.1, right: cr = 0.5. 



t q ~ 1/A. The oscillating behavior can be traced back to the functional form of 
S P red(tN+i), cf. Figure |]. The entropy S P red(tN+i) has two maxima for £ > 0, 
the absolute maximum determining t q . By construction of MaxEnt sampling the 
next measurement will be placed at t g i ooa i and S(t g i b a i) will decrease relative to 
S(ti oca i). After one or more measurements the predictive entropy at the hitherto 
global maximum will be lower than at the hitherto local maximum and the two 
maxima will change their roles. 

From Figure || it is evident that the larger sampling times are not exactly at 
t q = 1/A. Instead, the values of t q have some spread which increases with the 
variance a 2 of the additive noise, see Figure |^. 

In the course of the experiment the standard errors of the parameters to be 
estimated decrease with 1/yN, asymptotically, as expected, cf. See Figure |] 
for an example. 

A note is in place here concerning the oscillating behavior of the control pa- 
rameter. For time series, the control parameter t would be time. At first glance, it 
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Figure 4- Standard errors of the estimates of A, A, and a (top to bottom) obtained via sequential 
MaxEnt sampling. Shown are the results for the same simulation as in Figure B. left. 
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might seem impossible to perform sequential MaxEnt sampling since time can only 
be increased but not decreased. For many types of experiments, however, one can 
actually decrease the time of the next measurement by performing a new experi- 
ment. For instance, in measuring NMR relaxation times one would first magnetize 
the probe which defines the onset of the experiment, t = 0, and then measure the 
magnetization for several t > 0. For MaxEnt sampling, one would have to magne- 
tize, measure at t q , magnetize again, and so on. Hence, t q can increase or decrease 
in consecutive experiments. 

4. Comparison with equidistant sampling 

It is instructive to compare the results obtained for sequential MaxEnt sampling 
with other sampling methods. The most common alternative method is probably 
equidistant sampling where the experiments are performed at N equidistantly 
spaced values of the control parameter. For a signal of form ( JlCj ) the lowest of 
these values is obviously t = while the largest t max is up to the choice of the 
experimentalist . 

Clearly, there is an optimal choice of t max . For, choosing t max » 1/A would 
result in sampling of noise only, while t max < 1/A would give a poor estimate of 
A as the noiseless signal Aexp(-Xt) hardly varies in this range. This behavior is 
confirmed by simulations. In Figure [| we show the standard errors of the Bayesian 
estimates of A, A, a for various values of t max . As can be seen, the standard error 
of A increases with t max while the standard error of A has some minimum around 
t max ~ 2/A; the standard error of the estimate of a is basically independent of 

tmax • 

Consequently, in choosing t max there is a tradeoff between estimating A (lead- 
ing to t max — > 0) and A (leading to t max ~ 2/A). Besides this tradeoff, not knowing 
A one could determine t max only based on the prior knowledge on A. In contrast 
to that, sequential MaxEnt sampling, as described in the previous section, does 
not require the setting of any maximum control parameter t max and places its 
measurements automatically. 

In addition, MaxEnt sampling leads to a much better resolution of the parame- 
ters to be estimated. This is exemplified in Figure ^, which compares the obtained 
standard errors of A, A for MaxEnt and equidistant sampling. 

5. Application to sums of exponentials 

Exponentially decaying signals can be found in many physical phenomena as first 
order differential equations are among the most common in physics. However, the 
spectrum {A„, A n } of a multi-exponential signal 

/(t) = ^A„e X p(-A„t) (11) 

n 

is difficult to estimate. The reason is that the required inverse (discrete) Laplace 
transform is known to be an ill-posed problem IqH . A small amount of noise added 



8 



PETER RIEGLER AND NESTOR CATICHA 




2 4 6 8 10 

tmax 

Figure 5. Standard errors of the Bayesian estimates of A, A, it of a signal of form (fic|). Results 
have been obtained for equidistant sampling of N = 100 points between t = and t max , averaged 
over 20 independent runs (same simulation settings as in Figure Q, left). Top to bottom: standard 
errors of A, A, a. Note that depending on the parameters to be estimated there is a different 
optimal choice for t max . The standard error of a is basically independent of t max . 
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Figure 6. Comparison between sequential MaxEnt sampling (dashed line) and equidistant sam- 
pling (solid line). Shown are the standard errors for A (left) and A (right). Simulation parameters 
are for the same settings as in Figure H, averaged over 20 independent runs. Equidistant sampling 
has been performed with t ma x = 2.0. 



to fit) leads to a nonvanishing contribution to the spectrum even in the limit of 
a very small perturbation. 

Moreover, the quality of the spectrum to be estimated from measurements de- 
pends strongly on the values of the control parameter t at which the measurements 
are performed ||. For instance, choosing t > 1/A* certainly rules out the detection 
of any A„ > A* in ([n]). Some good prior knowledge about the values of A„ to be 
expected in the signal would be very helpful. But even such knowledge would leave 
open the question how to choose the sampling times U. For instance, in porous 
materials the magnetization decays approximately like 



fit) = ^2 -J ex P (-A n 2 t) , 

n=l H 



(12) 
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Figure 7. Left: Simulation results for the sampling times t q for a signal of type ([H]) with two 
exponentials. Parameters are Ai = 1, A2 = 1/4, Ai = 1, A2 = 4, a = 0.01. Note that the first 
four measurements are chosen equidistantly in order to start up the experiment. Right: Entropy 
of the corresponding predictive distribution after N = 50 measurements. 



i.e. relaxation times and corresponding amplitudes rapidly decrease like 1/n 2 . How 
does one have to choose the sampling times in order to resolve the spectrum up 
to a desired n max 7 

By construction of the algorithm, sequential MaxEnt sampling provides the 
answer to this question. The choice of n max fixes the model function Given 
some initial data, the best query t q can be computed as described in Section H. 
Figure fj] shows the simulation result for the values of t q for a signal of form ( |l2| ) 

with Umax = 2. 

As in the monoexponential case, the values of the sampled control parameters 
oscillate between t = and the characteristic time constants in the signal, i.e. 1/Ai 
and 1/A 2 in our case. Note that in Figure ^ the signal is also sampled at times 
t > 1/Ax larger than the largest characteristic time constant The reason for this is 
probably due to the fact that we have not imposed any priors on the parameters 
A. In order to rule out the presence of any A < one would clearly have to sample 
at large values of t. 

Figure also depicts the characteristic dependence of the predictive entropy on 
t at+1 ■ As in Section |^ the location of its maxima characterizes the sampled values 
of the control parameter t. 

In Figure || we show the resulting posterior on A (with A4 and a treated as 
nuisance parameters) in order to compare with equidistant sampling. As can be 
seen, sequential MaxEnt sampling leads to lower error bars in the estimates of A. In 
particular the width of the posterior in the large decay constant A2 is considerably 
larger for equidistant sampling. For signals of the form (JT^) this is exactly what 
one would expect since the term with the smaller decay constant Ai is sampled 
more frequently than the term decaying with A2. For sequential MaxEnt sampling, 
however, the latter term is sampled more frequently, cf. Figure |?]. 

Despite the good performance of sequential MaxEnt sampling it has to be 
stressed that it becomes impracticable in its current form for n max > 2. For a low 
number N of experiments it will be difficult to find evidence for the decay constant 
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Figure 8. Contour plots of the posterior P(Ai , A2 |-Dso)- Left: posterior corresponding to the 
MaxEnt sampling experiment of Figure M. Right: posterior as obtained after equidistantly sam- 
pling 50 data points between t = and t max = 1.5. The chosen value of t max represents the 
optimal if one aims at highly accurate estimates of the parameters A. In contrast to monoex- 
ponential sampling, however, the optimal choice of t max hardly depends on the values of A for 
intermediate values of t max 



An max in (P^)- A low number of experiments N will result in a degeneracy in the 
estimates of some of the Ai leading to divergences in the posterior. The remedy here 
would be to incorporate model selection as described in Q into the algorithm. This 
would probably result in an algorithm which starts with the hypothesis n max — 
1 for low N and increases n max as soon as the relative probability for such a 
hypothesis is larger. 



6. Summary 

Starting from the question at what value of the control parameter one should 
perform the next experiment after having performed N measurements, we have 
analytically derived a sequential sampling procedure which maximizes the infor- 
mation on the parameters to be estimated. Looking for the query i^v+i that leads 
to a maximum information gain (minimization of the posterior entropy) led us to 
the conclusion that one should find t q that maximizes the entropy of the predictive 
distribution. The reason for this is that t q is the value of the control parameter 
for which the prediction is least secure. 

We have applied the constructed MaxEnt sampling procedure to the fitting of 
exponential signals. We found that the resulting queries oscillate between the time 
constants present in the signal. The width of the distribution of queries around the 
time constants depends on the variance of noise by which the signal is corrupted. 

For purely monoexponential signals we have compared sequential MaxEnt sam- 
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pling with equidistant sampling and found a considerably better accuracy of the 
estimates of MaxEnt sampling. The application of sequential MaxEnt sampling to 
multiexponential signals is of particular interest because of their frequent appear- 
ance in science and the difficulties associated with the inverse Laplace transform. 
We have motivated that proper sampling helps to perform the inversion task. 
We have exemplified that sequential MaxEnt sampling is capable of tracking the 
spectrum quickly. 

However, so far we have applied the algorithm only to an assumed constant 
number of exponentially decaying terms. This is a bit unrealistic, since typically 
there would be infinitely many such terms, like the NMR signals mentioned in 
Section Moreover, the assumption of a large fixed number of exponential terms 
in the signal leads to degeneracies and hence divergences in the posterior. We hope 
that these divergences can be removed by hypothesizing a small number n max 
of exponential terms at the onset of the experiment and increasing this number 
whenever the relative probability favors the hypothesis of a larger value of n max . 

Finally we note that due to the nonlinear dependence of the model functions 
on the decay constants A results could only be obtained numerically. For models 
completely linear in the model parameters it is possible to do the calculation of 
query times t q analytically fl(i[| , however. For such a case, the result is rather 
trivial, leading to t q — > oo. 

The authors are grateful to S. Rabbani and C. Mendonga for stimulating dis- 
cussions. P.R. acknowledges the hospitality extended to him at Universidade de 
Sao Paulo where this work has been initiated. 
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